library(tidyverse)
library(gapminder)
A collection of packages for data manipulation
Tony Duan
July 11, 2023
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
# A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
filter()
select()
select one variable
# A tibble: 6 × 1
country
<fct>
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
5 Afghanistan
6 Afghanistan
exclude one variable
# A tibble: 6 × 5
continent year lifeExp pop gdpPercap
<fct> <int> <dbl> <int> <dbl>
1 Asia 1952 28.8 8425333 779.
2 Asia 1957 30.3 9240934 821.
3 Asia 1962 32.0 10267083 853.
4 Asia 1967 34.0 11537966 836.
5 Asia 1972 36.1 13079460 740.
6 Asia 1977 38.4 14880372 786.
mutate()
mutate
# A tibble: 6 × 7
country continent year lifeExp pop gdpPercap pop_k
<fct> <fct> <int> <dbl> <int> <dbl> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779. 8425.
2 Afghanistan Asia 1957 30.3 9240934 821. 9241.
3 Afghanistan Asia 1962 32.0 10267083 853. 10267.
4 Afghanistan Asia 1967 34.0 11537966 836. 11538.
5 Afghanistan Asia 1972 36.1 13079460 740. 13079.
6 Afghanistan Asia 1977 38.4 14880372 786. 14880.
transmute()
transmute
group_by() and summarise()
group by
# A tibble: 5 × 5
continent total_pop count avg_pop sd_pop
<fct> <dbl> <int> <dbl> <dbl>
1 Africa 743832984 52 14304480. 19873013.
2 Americas 796900410 25 31876016. 62032823.
3 Asia 3383285500 33 102523803. 262349716.
4 Europe 568944148 30 18964805. 22748145.
5 Oceania 22241430 2 11120715 10528152.
arrange()
sort in ascending order
# A tibble: 5 × 2
continent total_pop
<fct> <dbl>
1 Oceania 22241430
2 Europe 568944148
3 Africa 743832984
4 Americas 796900410
5 Asia 3383285500
sort in descending order
R for Data Science (book): https://r4ds.had.co.nz/
---
title: "R Package: [tidyverse]"
subtitle: "A collection of packages for data manipulation"
author: "Tony Duan"
date: "2023-07-11"
categories: [packages]
execute:
  warning: false
  error: false
format:
  html:
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
```{r}
library(tidyverse)
library(gapminder)
```
```{r}
data001 <- gapminder
head(data001)
```
# clean variable names with `janitor::clean_names()`
```{r}
glimpse(data001)
```
```{r}
data002 <- data001 %>% janitor::clean_names()
glimpse(data002)
```
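The gapminder columns are already fairly tidy, so the effect of `clean_names()` is easier to see on messier input. The following is a minimal sketch on a small tibble; the object `messy` and its column names are hypothetical and exist only for illustration (the values are the first gapminder row shown earlier).

```{r}
library(tibble)
library(janitor)

# Hypothetical one-row data frame with inconsistent column names
messy <- tibble(
  `Country Name` = "Afghanistan",
  lifeExp = 28.8,
  `GDP per cap` = 779
)

# clean_names() converts column names to snake_case by default
cleaned <- clean_names(messy)
names(cleaned)
```

Only the column names change; the data values are left as they are.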
# filter data with `filter()`
```{r}
data002 <- data001 %>% filter(country == "China", year == 1997)
data002
```
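Conditions separated by commas in `filter()` are combined with AND. To match any of several values, `%in%` is a common choice. A short sketch, assuming the `gapminder` data loaded above; the object name `asia_pair` is illustrative:

```{r}
library(dplyr)
library(gapminder)

# Rows for either country, restricted to years from 1997 onward
asia_pair <- gapminder %>%
  filter(country %in% c("China", "India"), year >= 1997)

asia_pair
```

Because gapminder records one row per country every five years, this keeps the 1997, 2002, and 2007 rows for each of the two countries.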
# select variable with `select()`
Select one variable:
```{r}
data002 <- data001 %>% select(country)
data002 %>% head()
```
Exclude one variable:
```{r}
data002 <- data001 %>% select(-country)
data002 %>% head()
```
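`select()` also accepts column ranges and helper functions such as `starts_with()`. A brief sketch of three common patterns; the object names `range_cols`, `by_prefix`, and `dropped` are illustrative:

```{r}
library(dplyr)
library(gapminder)

# Keep a contiguous range of columns, from country through year
range_cols <- gapminder %>% select(country:year)

# Keep columns whose names start with a given prefix
by_prefix <- gapminder %>% select(starts_with("co"))

# Drop several columns at once
dropped <- gapminder %>% select(-c(pop, gdpPercap))

names(range_cols)
names(by_prefix)
names(dropped)
```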
# create new variable with `mutate()`
Add a column with the population in thousands:
```{r}
data002 <- data001 %>% mutate(pop_k = pop / 1000)
data002 %>% head()
```
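A single `mutate()` call can create several columns, and later expressions can refer to columns created earlier in the same call. A minimal sketch; the column names `gdp` and `gdp_bn` and the object `gdp_tbl` are illustrative choices, not part of the dataset:

```{r}
library(dplyr)
library(gapminder)

gdp_tbl <- gapminder %>%
  mutate(
    gdp = gdpPercap * pop,  # total GDP: per-capita GDP times population
    gdp_bn = gdp / 1e9      # reuses the gdp column created on the line above
  )

gdp_tbl %>% head()
```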
# create new variable and keep only the new variable with `transmute()`
Keep only the newly created column:
```{r}
data002 <- data001 %>% transmute(pop_k = pop / 1000)
data002 %>% head()
```
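Recent dplyr documentation marks `transmute()` as superseded in favour of `mutate()` with `.keep = "none"`. The sketch below assumes a dplyr version that supports the `.keep` argument (1.0.0 or later); the object name `pop_k_only` is illustrative.

```{r}
library(dplyr)
library(gapminder)

# Assumption: dplyr >= 1.0.0, where mutate() accepts .keep
# .keep = "none" retains only the columns created in this call
pop_k_only <- gapminder %>% mutate(pop_k = pop / 1000, .keep = "none")

pop_k_only %>% head()
```

On grouped data `mutate(.keep = "none")` also keeps the grouping columns, so the two forms are not always interchangeable.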
# summarise with `group_by()` and `summarise()`
Summarise the 1997 population by continent:
```{r}
data002 <- data001 %>%
  filter(year == 1997) %>%
  group_by(continent) %>%
  summarise(
    total_pop = sum(pop),  # total population
    count = n(),           # number of countries
    avg_pop = mean(pop),   # mean population
    sd_pop = sd(pop)       # standard deviation of population
  )
data002 %>% head()
```
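Grouping also works with more than one variable, and `count()` is a shorthand when only the group sizes are needed. A sketch under the same gapminder assumptions; the names `life_by_group`, `avg_life_exp`, `n_countries`, and `country_counts` are illustrative:

```{r}
library(dplyr)
library(gapminder)

# Two grouping variables; .groups = "drop" returns an ungrouped tibble
life_by_group <- gapminder %>%
  filter(year %in% c(1997, 2007)) %>%
  group_by(continent, year) %>%
  summarise(
    avg_life_exp = mean(lifeExp),
    n_countries = n(),
    .groups = "drop"
  )
life_by_group

# count() is shorthand for group_by() followed by summarise(n = n())
country_counts <- gapminder %>%
  filter(year == 1997) %>%
  count(continent)
country_counts
```

The counts from `count()` match the `count` column in the summary above.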
# arrange data with `arrange()`
Sort in ascending order:
```{r}
data002 <- data001 %>%
  filter(year == 1997) %>%
  group_by(continent) %>%
  summarise(total_pop = sum(pop)) %>%
  arrange(total_pop)
data002 %>% head()
```
Sort in descending order:
```{r}
data002 <- data001 %>%
  filter(year == 1997) %>%
  group_by(continent) %>%
  summarise(total_pop = sum(pop)) %>%
  arrange(desc(total_pop))
data002 %>% head()
```
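`arrange()` accepts several sort keys, and `slice_max()` is a convenient way to keep only the top rows. A minimal sketch; the object names `sorted_1997` and `top3` are illustrative:

```{r}
library(dplyr)
library(gapminder)

# Sort by continent, then by population within each continent (largest first)
sorted_1997 <- gapminder %>%
  filter(year == 1997) %>%
  arrange(continent, desc(pop))
sorted_1997 %>% head()

# Keep the three most populous countries in 1997
top3 <- gapminder %>%
  filter(year == 1997) %>%
  slice_max(pop, n = 3)
top3
```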
# Reference
R for Data Science (book): https://r4ds.had.co.nz/